11 research outputs found
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
In this paper, we study the use of the deep Transformer translation model for
the CCMT 2022 Chinese-Thai low-resource machine translation task. We first
explore experiment settings (including the number of BPE merge operations,
dropout probability, embedding size, etc.) for the low-resource scenario with a
6-layer Transformer. Considering that increasing the number of layers also
increases the regularization on new model parameters (additional dropout
modules are introduced when using more layers), we adopt the best-performing
setting but increase the depth of the Transformer to 24 layers to obtain
improved translation quality. Our work achieves state-of-the-art performance on
Chinese-to-Thai translation in the constrained evaluation.
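The setting search described above can be pictured as a small grid sweep. The sketch below is purely illustrative (the specific BPE merge counts, dropout values, and embedding sizes are hypothetical, not taken from the paper):

```python
from itertools import product

# Hypothetical grid over the settings named in the abstract: number of BPE
# merge operations, dropout probability, and embedding size. Each config
# would be used to train a 6-layer Transformer before scaling to 24 layers.
bpe_merges = [4000, 8000, 16000]
dropouts = [0.1, 0.3, 0.5]
embed_sizes = [256, 512]

configs = [
    {"bpe_merges": b, "dropout": d, "embed_size": e}
    for b, d, e in product(bpe_merges, dropouts, embed_sizes)
]
print(len(configs))  # 18 candidate settings to evaluate
```

In practice each configuration would be scored on a held-out dev set, and only the best one carried over to the deeper 24-layer model.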
NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering
Hybrid tabular-textual question answering (QA) requires reasoning from
heterogeneous information, and the types of reasoning are mainly divided into
numerical reasoning and span extraction. Although numerical reasoning is the
main challenge of the task relative to extractive QA, current methods simply
use an LSTM to autoregressively decode program sequences, and each decoding step
produces either an operator or an operand. However, the step-by-step decoding
suffers from exposure bias, and the accuracy of program generation drops
sharply with progressive decoding. In this paper, we propose a
non-autoregressive program generation framework, which facilitates program
generation in parallel. Our framework, which independently generates complete
program tuples containing both operators and operands, can significantly boost
the speed of program generation while addressing the error accumulation issue.
Our experiments on the MultiHiertt dataset show that our model brings large
improvements (+7.97 EM and +6.38 F1 points) over a strong baseline,
establishing a new state of the art while being much faster (21x) in program
generation. As the number of numerical reasoning steps increases, our method's
performance also drops significantly less than the baseline's.
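The core idea of the framework, generating each program tuple independently rather than token by token, can be sketched as follows. This is a minimal illustration, not the paper's model: the operator vocabulary, operand candidates, and logits are all toy placeholders.

```python
# Minimal sketch of non-autoregressive program generation: every position
# independently emits a complete (operator, operand, operand) tuple, so all
# positions decode in parallel and an error at one step never feeds into
# the next (avoiding the exposure bias of step-by-step LSTM decoding).
OPERATORS = ["add", "subtract", "multiply", "divide"]  # hypothetical vocabulary

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def generate_parallel(tuple_logits):
    """tuple_logits: per position, scores for (operator, operand A, operand B)."""
    program = []
    for op_scores, a_scores, b_scores in tuple_logits:
        program.append((
            OPERATORS[argmax(op_scores)],
            argmax(a_scores),  # index into candidate operands (toy)
            argmax(b_scores),
        ))
    return program

# Toy logits for a 2-step program
logits = [
    ([0.1, 0.9, 0.0, 0.0], [0.2, 0.8], [0.9, 0.1]),
    ([0.0, 0.0, 0.7, 0.3], [0.6, 0.4], [0.1, 0.9]),
]
print(generate_parallel(logits))  # [('subtract', 1, 0), ('multiply', 0, 1)]
```

Because no tuple waits on the previous one, the loop above could run as a single batched tensor operation, which is where the reported 21x speedup comes from.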
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue
Recent advances in Large Language Models (LLMs) have achieved remarkable
breakthroughs in understanding and responding to user intents. However, their
performance in some specialized domains, such as Chinese medicine, lags behind
that in general use cases. Existing efforts to incorporate Chinese medicine into LLMs
rely on Supervised Fine-Tuning (SFT) with single-turn and distilled dialogue
data. These models lack the ability for doctor-like proactive inquiry and
multi-turn comprehension, and cannot always align their responses with expert
standards of safety and professionalism. In this work, we introduce Zhongjing, the first
Chinese medical LLaMA-based LLM that implements an entire training pipeline
from pre-training to reinforcement learning with human feedback (RLHF).
Additionally, we introduce a Chinese multi-turn medical dialogue dataset of
70,000 authentic doctor-patient dialogues, CMtMedQA, which significantly
enhances the model's capability for complex dialogue and proactive inquiry
initiation. We define a refined annotation rule and evaluation criteria given
the biomedical domain's unique characteristics. Results show that our model
outperforms baselines in various capacities and matches the performance of
ChatGPT on a few abilities, despite using 50x less training data than the
previous best model and 100x fewer parameters than ChatGPT. RLHF further
improves the model's instruction-following ability and safety. We also release
our code, datasets, and model for further research.
Construction of cardiovascular information extraction corpus based on electronic medical records
Cardiovascular disease has a significant impact on both society and patients, making knowledge-based research, such as work that utilizes knowledge graphs and automated question answering, necessary. However, existing research on corpus construction for cardiovascular disease is relatively limited, which has hindered further knowledge-based research on this disease. Electronic medical records contain patient data that span the entire diagnosis and treatment process and include a large amount of reliable medical information. Therefore, we collected electronic medical record data related to cardiovascular disease and, drawing on relevant clinical experience, developed a standard for labeling cardiovascular electronic medical record entities and entity relations. By building a sentence-level labeling dictionary with a rule-based semi-automatic method, we constructed a cardiovascular electronic medical record entity and entity-relation labeling corpus (CVDEMRC). The CVDEMRC contains 7691 entities and 11,185 entity-relation triples, and the consistency examination yielded 93.51% and 84.02% for entity and entity-relation annotations, respectively, demonstrating good annotation consistency. The CVDEMRC constructed in this study is expected to provide a database for information extraction research related to cardiovascular diseases.
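The consistency figures quoted above come from comparing independent annotations of the same text. A minimal sketch of that kind of check, with hypothetical entity labels standing in for the corpus's actual tag set:

```python
# Toy annotation-consistency check: the fraction of items that two
# annotators label identically. The label names below are illustrative,
# not the CVDEMRC tag set.
def agreement_rate(ann_a, ann_b):
    matches = sum(1 for x, y in zip(ann_a, ann_b) if x == y)
    return matches / len(ann_a)

a = ["DISEASE", "DRUG", "SYMPTOM", "DISEASE", "TEST"]
b = ["DISEASE", "DRUG", "SYMPTOM", "DRUG", "TEST"]
print(f"{agreement_rate(a, b):.2%}")  # 80.00%
```

Real annotation studies typically report chance-corrected measures (e.g. Cohen's kappa) or span-level F1 as well, but a raw agreement percentage of this form is the simplest consistency statistic.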
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Artificial Intelligence (AI), along with the recent progress in biomedical
language understanding, is gradually changing medical practice. With the
development of biomedical language understanding benchmarks, AI applications
are widely used in the medical field. However, most benchmarks are limited to
English, which makes it challenging to replicate many of the successes in
English for other languages. To facilitate research in this direction, we
collect real-world biomedical data and present the first Chinese Biomedical
Language Understanding Evaluation (CBLUE) benchmark: a collection of natural
language understanding tasks including named entity recognition, information
extraction, clinical diagnosis normalization, single-sentence/sentence-pair
classification, and an associated online platform for model evaluation,
comparison, and analysis. To establish evaluation on these tasks, we report
empirical results with 11 current pre-trained Chinese models; the results show
that state-of-the-art neural models still perform far worse than the human
ceiling. Our benchmark is released at
\url{https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us}.
Studies on a hybrid way of rules and statistics for Chinese conjunction usages recognition
Conjunctions are a kind of function word. Different conjunctions may have different usages, and the same conjunction may have different usages in different contexts. Studies on conjunction usage recognition are helpful for the automatic understanding of modern Chinese texts. This paper adopts a hybrid approach of rules and statistics to identify conjunction usages. Experimental results show that methods combining rules and statistics are helpful for the automatic recognition of conjunction usages: on the word-segmented and part-of-speech-tagged corpus of the April, May, and June 2000 People's Daily, the F-measure reaches 91.42%, 90.88%, and 90.92%, respectively, in open tests.
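A rule-then-statistics cascade of the kind described can be sketched as follows. The rules, labels, and counts below are toy placeholders (and in English for readability), not the paper's actual rule set:

```python
# Hypothetical rule-then-statistics cascade for conjunction usage
# recognition: deterministic rules fire first; otherwise fall back to the
# most frequent usage observed for that conjunction in a tagged corpus.
RULES = {
    # (conjunction, POS of following word) -> usage label (toy rules)
    ("and", "NOUN"): "coordinating-NP",
    ("but", "VERB"): "adversative",
}
CORPUS_COUNTS = {
    # usage frequencies per conjunction, as counted from training data (toy)
    "and": {"coordinating-NP": 120, "coordinating-clause": 80},
    "but": {"adversative": 200, "concessive": 15},
}

def recognize_usage(conj, next_pos):
    rule_hit = RULES.get((conj, next_pos))
    if rule_hit is not None:
        return rule_hit                      # rule-based decision
    counts = CORPUS_COUNTS.get(conj, {})
    return max(counts, key=counts.get) if counts else "unknown"  # statistical fallback

print(recognize_usage("and", "NOUN"))  # rule fires: coordinating-NP
print(recognize_usage("but", "ADV"))   # no rule; fallback picks adversative
```

The appeal of this design is that high-precision rules handle the unambiguous contexts, while corpus statistics cover the long tail of contexts no rule matches.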
The Comparative Experimental Study of Multilabel Classification for Diagnosis Assistant Based on Chinese Obstetric EMRs
Obstetric electronic medical records (EMRs) contain massive amounts of medical data and health information. Information extraction and diagnosis assistance for obstetric EMRs are of great significance for improving population fertility levels. The admitting diagnosis in the first course record of an EMR is reasoned from various sources, such as chief complaints, auxiliary examinations, and physical examinations. Based on analyses of obstetric EMRs, this paper treats the diagnosis assistant as a multilabel classification task. The latent Dirichlet allocation (LDA) topic and the word vector are used as features, and four multilabel classification methods, BP-MLL (backpropagation multilabel learning), RAkEL (RAndom k labELsets), MLkNN (multilabel k-nearest neighbor), and CC (classifier chains), are utilized to build the diagnosis assistant models. Experiments conducted on real cases show that BP-MLL achieves the best performance, with an average precision of up to 0.7413 ± 0.0100 when the number of label sets and the word dimensions are 71 and 100, respectively. The results of the diagnosis assistant can be introduced as a supplementary learning method for medical students. Additionally, the method can be used not only for obstetric EMRs but also for other medical records.
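The multilabel framing above means each record can receive several diagnoses at once. A minimal, self-contained sketch in the spirit of MLkNN (a simplified majority-vote variant over toy feature vectors, not the paper's implementation or its LDA/word-vector features):

```python
# Simplified MLkNN-style prediction: a query record receives every label
# held by a majority of its k nearest training records. Feature vectors
# and diagnosis labels below are toy examples, not real obstetric data.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mlknn_predict(train_x, train_y, query, k=3):
    neighbors = sorted(range(len(train_x)),
                       key=lambda i: euclidean(train_x[i], query))[:k]
    candidate_labels = {l for i in neighbors for l in train_y[i]}
    return {l for l in candidate_labels
            if sum(l in train_y[i] for i in neighbors) > k / 2}

train_x = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.85, 0.9]]
train_y = [{"anemia"}, {"anemia", "preeclampsia"}, {"GDM"}, {"GDM", "preeclampsia"}]
print(mlknn_predict(train_x, train_y, [0.12, 0.22]))  # {'anemia'}
```

Full MLkNN additionally estimates per-label prior and conditional probabilities from neighbor counts via a MAP rule, rather than the plain majority vote used here.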